Scalable Root-Cause Analysis
Abstract
From May 2 to May 7, 2010, the Dagstuhl Seminar 10181 "Program Development for Extreme-Scale Computing" was held at Schloss Dagstuhl, Leibniz Center for Informatics. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. This paper collects abstracts of the presentations given during the seminar, together with abstracts of seminar results and ideas. Links to extended abstracts or full papers are provided where available.
Similar resources
Root-cause analysis for time-series anomalies via spatiotemporal causal graphical modeling
Modern distributed cyber-physical systems encounter a large variety of anomalies, and in many cases they are vulnerable to catastrophic fault propagation scenarios due to strong connectivity among the sub-systems. In this regard, root-cause analysis becomes highly intractable because complex fault propagation mechanisms combine with diverse operating modes. This paper presents a new data-...
Root Cause and Error Analysis
Error is an inevitable part of life and cannot be completely eliminated, but it can be minimized. Root cause analysis is a technique for understanding the systemic causes of an error, looking beyond the person or people who committed it to include the contextual and environmental factors present when it occurred. An important factor in the occurrence of an error is a root cause (c...
Scalable Approach to Failure Analysis of High-Performance Computing Systems
© 2014 Doaa Shawky (http://dx.doi.org/10.4218/etrij.14.0113.1133). Failure analysis is necessary to clarify the root cause of a failure, predict the next time a failure may occur, and improve the performance and reliability of a system. However, it is not an easy task to analyze and interpret failure data, especially for complex systems. Usually, these data are represented using many attribut...
ALACA: A platform for dynamic alarm collection and alert notification in network management systems
Funding information: TUBITAK TEYDEB 1501, Grant/Award Number: 3130411. Summary: Mobile network operators run Operations Support Systems that produce vast amounts of alarm events. These events can have different significance levels and domains, and can also trigger other events. Network operators face the challenge of identifying the significance and root causes of these system problems in real time and ...
A comprehensive TCP fairness analysis in high speed networks
The short-term dynamics of competing high speed TCP flows could have surprising impacts on their long-term fairness. As a result, this could have a severe impact on the co-existence and, finally, the deployment feasibility of different seemingly promising proposals for the next generation networks. However, to the best of our knowledge, no root-cause analysis of this observation is available. This is t...